AD-A157 294




Final  Report 

Computerized  Adaptive  Measurement 
of  Achievement  and  Ability 


David  J.  Weiss 


June  1985 


Computerized  Adaptive  Testing  Laboratory 
Department  of  Psychology
University  of  Minnesota 
Minneapolis  MN  55455 


Final  Report  of  Project  NR150-433,  N00014-79-C-0172 


Supported  by  the 
Office  of  Naval  Research 
Air  Force  Human  Resources  Laboratory 
Air  Force  Office  of  Scientific  Research 
Army  Research  Institute 




APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 
REPRODUCTION  IN  WHOLE  OR  IN  PART  IS  PERMITTED  FOR 
ANY  PURPOSE  OF  THE  UNITED  STATES  GOVERNMENT.




REPORT DOCUMENTATION PAGE

Title: Final Report: Computerized Adaptive Measurement of Achievement and
Ability

Type of report and period covered: Final Report, February 1979 - April 1983

Author: David J. Weiss

Contract or grant number: N00014-79-C-0172

Performing organization name and address: Department of Psychology,
University of Minnesota, Minneapolis, MN 55455

Program element, project, task area, and work unit numbers: PE: 615534,
Proj: RR042-04, TA: RR042-04-01, WU: NR150-433

Controlling office name and address: Personnel and Training Research Programs,
Office of Naval Research, Arlington, VA 22217

Report date: June 1985

Number of pages: 20

Security classification (of this report): Unclassified

Distribution statement (of this report): Approved for public release;
distribution unlimited. Reproduction in whole or in part is permitted for any
purpose of the United States Government.

Supplementary notes: This research was supported by funds from the Office of
Naval Research, Air Force Human Resources Laboratory, Air Force Office of
Scientific Research, and Army Research Institute, and monitored by the Office
of Naval Research.

Abstract: The research program's objectives are described, and the research
approach is summarized and related to the ten technical reports and other
project publications. Thirteen major research findings are presented.
Abstracts of the ten technical reports are also included.


CONTENTS


Objectives
    Adaptive Achievement Testing
    Adaptive Ability Testing

Approach
    Adaptive Achievement Testing
        Intersubtest Branching
        The Dimensionality of Measured Achievement Over Time
        Adaptive Mastery Testing
        Adaptive Self-Referenced Testing
    Adaptive Ability Testing
        Adaptive Testing Strategies
        Response Modes, Test Item Formats, and Effects of Test
            Administration Variables

Major Findings
    Adaptive Achievement Testing
    Adaptive Ability Testing

Abstracts of Research Reports
    79-6. Efficiency of an Adaptive Inter-Subtest Branching Strategy
          in the Measurement of Classroom Achievement
    80-4. A Comparison of Adaptive, Sequential, and Conventional
          Testing Strategies for Mastery Decisions
    80-5. An Alternate-Forms Reliability and Concurrent Validity
          Comparison of Bayesian Adaptive and Conventional Ability
          Tests
    81-1. Review of Test Theory and Methods
    81-2. Effects of Immediate Feedback and Pacing of Item
          Presentation on Ability Test Performance and Psychological
          Reactions to Testing
    81-3. A Validity Comparison of Adaptive and Conventional
          Strategies for Mastery Testing
    81-4. Factors Influencing the Psychometric Characteristics of an
          Adaptive Testing Strategy for Test Batteries
    81-5. Dimensionality of Measured Achievement Over Time
    83-2. Bias and Information of Bayesian Adaptive Testing
    83-3. Effect of Examinee Certainty on Probabilistic Test Scores
          and a Comparison of Scoring Methods for Probabilistic
          Responses


FINAL  REPORT 

Computerized  Adaptive  Measurement  of  Achievement  and  Ability 


Objectives 

This research program was designed to investigate the applications of item
response theory (IRT) and computerized adaptive testing to the unique problems
of the measurement of ability and the measurement of achievement. Specific
objectives relevant to these two areas were as follows:


Adaptive  Achievement  Testing 


1.  To study the relative efficiency of various approaches to intersubtest
branching in achievement test batteries.

2.  To investigate the dimensionality of measured achievement over time.

3.  To study the applicability of IRT models to the problem of mastery testing
and to compare models for adaptive mastery testing with other approaches to
the improvement of mastery decisions and/or reduction in test length in
mastery testing.

4.  To explicate the concept of Adaptive Self-Referenced Testing and to examine
its applicability to the achievement testing problem.


Adaptive Ability Testing


5.  To evaluate the performance of adaptive testing strategies under conditions
which more reasonably represent the conditions under which these strategies
might be used, and to examine the performance of adaptive testing strategies
in live testing.

6.  To evaluate the utility for adaptive testing of response modes and test item
formats usable in adaptive ability testing.

Research  in  pursuance  of  these  objectives  began  in  February  1979  and  continued 
through  April  1983. 


Approach 

The research utilized a combination of monte carlo simulation studies and
live-testing studies.

Adaptive  Achievement  Testing 

Intersubtest branching. Intersubtest branching is an approach to the
utilization of adaptive testing methodologies in a multidimensional item pool.
In intersubtest branching, IRT item parameters are estimated separately for
each subtest of a multisubtest battery. Using any of a number of adaptive
testing strategies, adaptive testing occurs within the subtest based on
appropriate item selection rules and a test termination criterion appropriate
for the purpose of testing. Upon completion of a subtest in the test battery,
the final trait level estimate (θ̂) is then used as an entry point to begin
testing in a subsequent subtest in the battery. As originally proposed,
subtests in a battery are ordered by the magnitudes of the squared multiple
correlations of each subtest with all other subtests in the battery. In this
way, the entry points for adaptive testing in each subtest utilize the
information available in the tests in the test battery that were most highly
correlated with it, which should shorten the adaptive tests for later subtests
in the battery as much as possible.
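
As an illustration of the ordering rule, the squared multiple correlation (SMC)
of each subtest with all the others can be obtained from the inverse of the
subtest intercorrelation matrix R as 1 - 1/(R^-1)_ii. The Python sketch below
is not taken from the project software: the function names, the example
correlations, and the choice to place the highest-SMC subtest first are
assumptions made for illustration only.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """SMC of each subtest with all others: 1 - 1 / (R^-1)_ii."""
    inverse = invert(corr)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(corr))]


def order_subtests(names, corr):
    """Subtest names sorted from highest to lowest SMC (assumed direction)."""
    smc = squared_multiple_correlations(corr)
    return [name for _, name in sorted(zip(smc, names), reverse=True)]


# Hypothetical three-subtest battery.
names = ["B", "A", "C"]
corr = [[1.0, 0.6, 0.2],
        [0.6, 1.0, 0.5],
        [0.2, 0.5, 1.0]]
print(order_subtests(names, corr))
```

In an operational battery the correlation matrix would be estimated from
calibration data, and the regression used to set entry points would be a
separate step not shown here.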

Intersubtest adaptive branching was studied by real-data simulation in
Research Report 79-6, and by monte carlo simulation in Research Report 81-4.
The study reported in Research Report 79-6 used data from
conventionally-administered tests which were analyzed as if they had been
administered as an adaptive test, and the intersubtest branching strategy was
applied to these data. This study was designed to separate the effects of the
adaptive intrasubtest item selection procedure from those effects due to
intersubtest branching. The study also (1) allowed evaluation of the effects
of different intrasubtest termination criteria, (2) investigated the effect of
taking into account errors of measurement in the multiple regression procedure
used to determine test entry points, and (3) investigated the stability of the
regression equations in cross-validation.


Other aspects of the intersubtest branching strategy when applied to an
achievement test battery were investigated by monte carlo simulation in Research
Report 81-4. Questions of interest in this study included (1) the effects of
varying subtest order, (2) the utilization of different subtest termination
criteria, and (3) the effect of variable versus fixed entry on the psychometric
properties of the intersubtest branching strategy. Dependent variables included
(1) reductions in test length, (2) effect on test information, and (3)
correlations between achievement estimates and true achievement levels. The
study design also permitted separation of the effects of intrasubtest and
intersubtest adaptive branching.

The dimensionality of measured achievement over time. The effects of
instruction on measured achievement are usually measured at a single point in
time. That is, some instruction is given to an individual and at the end of the
period of instruction an achievement test is used to determine whether the
individual has reached an appropriate level of achievement. On the basis of such
information, aggregated across individuals, decisions are frequently made about
the adequacy of instructional programs, or about the impact (or lack thereof) of
instruction on a specific individual.

A  more  powerful  approach  to  the  measurement  of  achievement  would  involve 
the  use  of  pretests  and  posttests  to  determine  if  any  change  has  occurred  in 
measured  achievement  over  time.  Using  change  scores,  however,  implies  that  the 
variable  being  measured  is  the  same  at  pretest  as  it  is  at  posttest.  There  has 
been  very  little  empirical  data  available  concerning  this  issue. 

Research Report 81-5 was designed to investigate the question of whether
the achievement factor identified at pretest in an achievement test is the same
factor identified at posttest. Two studies utilized data on measured
achievement from groups of college students in mathematics classes and biology
classes. Achievement test item responses were factor analyzed prior to
instruction, and again at the end of instruction. In addition, mean differences
in test scores at pretest and posttest were analyzed. Factors obtained at
pretest were compared with those obtained at posttest to determine if the same
factor was found prior to and after instruction.

Kingsbury (1984) directly examined the characteristics of change scores
derived from adaptive and conventional tests. This study utilized data from
college-level biology examinations. Both adaptive and conventional tests were
administered in a complex design to groups of students in such a way that
reliabilities of the change scores could be determined separately for the two
types of tests in a number of different homogeneous content areas covered in the
course. The question raised by this study was based on hypotheses that the more
precise achievement level estimates resulting from adaptive testing should also
result in more reliable change scores in comparison to those from conventional
testing. Also studied was the effect of variable- versus fixed-length test
termination on the adaptive tests.

Adaptive Mastery Testing. Adaptive mastery testing (AMT) combines IRT and
adaptive testing into an efficient strategy for making mastery or classification
decisions. In this procedure, items used to make a mastery decision are
selected by an IRT maximum information adaptive testing strategy. Item responses
are scored using a Bayesian θ estimation procedure, and a confidence or
credibility interval is computed for the θ estimate. The confidence interval
around the estimate is then compared to a mastery cutoff score, which is also
expressed on the θ metric. A mastery decision is determined on the basis of
whether the credibility interval overlaps with the mastery criterion level, and
on which side of the mastery cutoff score the individual's θ estimate falls.
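
The decision rule just described can be sketched in Python as follows. This is
a minimal illustration, not the project's implementation: it assumes a normal
approximation to the posterior distribution of θ, a symmetric 95% interval, and
hypothetical function and argument names. The `force` option reflects one
reading of the rule, in which the side of the cutoff on which the estimate
falls is used when testing must stop while the interval still contains the
cutoff.

```python
from statistics import NormalDist


def mastery_decision(theta_hat, posterior_sd, cutoff,
                     confidence=0.95, force=False):
    """Classify an examinee from a Bayesian theta estimate.

    Returns "master" or "nonmaster" when the interval around theta_hat
    excludes the cutoff, and "continue" when it still contains the cutoff.
    With force=True (for example, when no items remain), the side of the
    cutoff on which theta_hat falls decides the classification.
    """
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lower = theta_hat - z * posterior_sd
    upper = theta_hat + z * posterior_sd
    if lower > cutoff:
        return "master"
    if upper < cutoff:
        return "nonmaster"
    if force:
        return "master" if theta_hat >= cutoff else "nonmaster"
    return "continue"


# Hypothetical examinee: estimate 0.2, posterior SD 0.3, cutoff 0.0.
print(mastery_decision(0.2, 0.3, 0.0))
```

In a variable-length test this check would be repeated after each item, so
that examinees far from the cutoff are classified after only a few items.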

Both monte carlo simulation and live testing were used to investigate
characteristics of the AMT strategy and to compare it with other approaches for
making mastery decisions. In Research Report 80-4 (also Kingsbury & Weiss, 1980a)
the AMT procedure was compared to a conventionally-based mastery testing
procedure and to a procedure based on Wald's sequential probability ratio test.
The procedures were compared in terms of their efficiency, based on the test
length required by the procedures to make a classification decision, on the
validity of the decisions made by each procedure, and on the type of
classifications made by each of the three testing procedures.
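
For readers unfamiliar with it, Wald's sequential probability ratio test can be
sketched for dichotomously scored items as follows. The sketch assumes a simple
binomial formulation in which a nonmaster answers any item correctly with
probability `p_nonmaster` and a master with probability `p_master`; these
values, the error rates, and the function name are illustrative assumptions and
not the settings used in Research Report 80-4.

```python
import math


def sprt_mastery(responses, p_nonmaster=0.5, p_master=0.8,
                 alpha=0.05, beta=0.05):
    """Wald SPRT on a sequence of 0/1 item scores.

    alpha is the tolerated rate of false mastery decisions and beta the
    tolerated rate of false nonmastery decisions.
    Returns (decision, number_of_items_used).
    """
    upper = math.log((1.0 - beta) / alpha)   # declare mastery at or above
    lower = math.log(beta / (1.0 - alpha))   # declare nonmastery at or below
    log_ratio = 0.0
    for n, score in enumerate(responses, start=1):
        if score:
            log_ratio += math.log(p_master / p_nonmaster)
        else:
            log_ratio += math.log((1.0 - p_master) / (1.0 - p_nonmaster))
        if log_ratio >= upper:
            return "master", n
        if log_ratio <= lower:
            return "nonmaster", n
    # Items ran out before either boundary was crossed.
    return "undecided", len(responses)


print(sprt_mastery([1, 1, 0, 1, 1, 1, 1, 1, 1, 1]))
```

Unlike AMT, this formulation treats all items as interchangeable, which is one
reason the two procedures can differ when items vary in difficulty and
discrimination.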

To examine the generality of the findings in live testing, in Research
Report 81-3 the AMT procedure and a conventional test were administered to
students in a biology class. In contrast to earlier studies which examined the
AMT procedure, actual adaptive mastery tests were administered to one subgroup
of students while the other received computer-administered conventional tests.
The performance of the two testing strategies was evaluated in terms of a
mastery criterion based on the students' final standing in the course, which was
a combination of their performance on course examinations and laboratory grades.

Adaptive Self-Referenced Testing. Adaptive self-referenced testing (ASRT)
is a combination of IRT and adaptive testing designed to permit the efficient
measurement of changes in achievement levels due to exposure to instruction.
This procedure is designed to measure individual changes in achievement in a
unidimensional item pool in a very efficient manner at a number of points of
instruction. It is thus an appropriate conceptualization for tracking
individual changes due to instruction at a number of points during a course,
since it permits an instructor to evaluate an individual's performance on a
minimum number of items at each of a number of testing occasions.

ASRT permits an instructor to measure a student early in a course, such as
on the first day, and as frequently as is necessary during the course. Based on
adaptive testing methodology, the data obtained from the Time 1 testing are used
as the entry point to Time 2 adaptive test administration, and this process is
followed for any number of test administrations. In addition, test termination
at any point in time can be based on the standard error band associated with an
individual's θ estimate at that point in time. ASRT is designed to
simultaneously permit intraindividual measurement of change, norm-based
measurement on the θ metric which can then be converted to the
proportion-correct measurement if desired, and a mastery-based
(criterion-referenced) achievement level estimate utilizing the procedures of
AMT. While no research directly related to ASRT was done during the contract
period, the method was described in some detail in Weiss & Kingsbury (1984).
Both Research Report 81-5 and the Kingsbury (1984) study have implications for
the use of ASRT and its future development.

Adaptive  Ability  Testing 

Adaptive testing strategies. A major focus of this research program was on
the evaluation of different approaches to computerized adaptive testing. While
earlier projects were concerned primarily with evaluating the relative
performance of adaptive and conventional testing strategies, in this project the
focus was on the IRT-based strategies and on their performance under a variety
of conditions. An overview of some aspects of project research is given by Weiss
(1982).

The performance of a Bayesian adaptive testing strategy was reported in
Research Report 83-2 (also Weiss & McBride, 1984). Owen's Bayesian adaptive
testing strategy was examined in three studies which utilized an accurate prior
θ estimate, a constant prior θ estimate with fixed test length, and a constant
prior θ estimate with variable test length. The performance of the adaptive
testing strategy was examined in terms of the bias and information of the θ
estimates as a function of θ. Also examined was the mean number of items
administered in the variable test length condition.

A major concern of the research was to evaluate the performance of adaptive
testing strategies under increasingly realistic conditions. Prior to these
studies, all studies evaluating the performance of adaptive testing strategies
did so under relatively unrealistic conditions. While characteristics of the
item pools varied in these earlier studies, the IRT item parameters used in
these simulation studies were considered to be accurate. However, in real item
pools, there is always some error associated with the item parameter estimates.
Since adaptive testing is designed to select items on the basis of these item
parameter estimates, it might be expected that inaccuracy in the item
parameter estimates would have detrimental effects on the performance of
adaptive testing strategies.




Consequently, two studies were designed to investigate effects of errors in
item parameter estimates on the performance of maximum information and Bayesian
adaptive testing strategies. The first study (Crichton, 1981) assessed the
effects of errors in item parameter estimates in the context of the
three-parameter logistic model. Crichton compared the performance of the two
IRT-based adaptive testing strategies (maximum information and Bayesian) with
the stratified adaptive (stradaptive) strategy, on the hypothesis that the
stradaptive strategy should be less sensitive to errors in the item parameter
estimates. Her monte carlo simulation study varied test length from 5 to 30
items. Test length was then crossed with three levels of error in the
discrimination (a) parameter, four levels of error in estimates for the
difficulty (b) parameter, and two levels of error in the pseudo-guessing (c)
parameter. In addition to considering these effects for a, b, and c separately,
two datasets examined the effects of joint errors in the a, b, and c parameters.
Dependent variables conditional on θ included the bias, root mean square error,
inaccuracy, and information in the θ estimates, and the correlation of θ and θ̂.
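
To make the role of the a, b, and c parameters concrete, the following Python
sketch shows the standard three-parameter logistic response function, its item
information function, and a maximum information selection step. It is an
illustration under stated assumptions rather than the simulation code used in
these studies: the scaling constant D = 1.7, the function names, and the small
item pool are all hypothetical.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed here)


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Item information for the three-parameter logistic model."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_item(theta_hat, pool, used):
    """Index of the unused item with maximum information at theta_hat.

    pool is a list of (a, b, c) tuples; used is a set of item indices
    that have already been administered.
    """
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates,
               key=lambda i: item_information(theta_hat, *pool[i]))


# Hypothetical pool of three items differing only in difficulty.
pool = [(1.0, -2.0, 0.2), (1.0, 0.0, 0.2), (1.0, 2.0, 0.2)]
print(select_item(0.1, pool, used=set()))
```

Because selection depends directly on the a, b, and c values supplied, errors
in those values can change which item is chosen at each step; how much that
matters for the resulting θ estimates is the empirical question these studies
addressed.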

Mattson (1983) also examined the performance of adaptive testing strategies
under conditions of error in item parameter estimates, using monte carlo
simulation. Mattson extended the Crichton study by studying similar effects in
the one- and two-parameter logistic models, in addition to the three-parameter
model. Whereas Crichton limited her trait level estimation to maximum likelihood
scoring of the response vectors, Mattson also included Bayesian scoring of the
maximum information and Bayesian adaptive tests. In addition, Mattson allowed
the level of correlation between the a and b parameters to vary at four
different levels, as well as examining the uncorrelated condition used by
Crichton. Similar to Crichton, Mattson also varied test length from 10 to 30
items. Finally, Mattson allowed errors in the a parameter to vary at two levels,
examined four levels of error in b, and one level of error in c. All conditions
were crossed with each other. Mattson's dependent variables were the same as
those studied by Crichton.

A second factor that can affect the performance of adaptive testing
strategies in a realistic item pool is the dimensionality of the item pool.
Since all IRT models assume a unidimensional item pool, deviations from
unidimensionality would be expected to affect the performance of adaptive
testing strategies in real item pools, which are rarely (if ever) strictly
unidimensional. As a result, Suhadolnik and Weiss (1985) examined the
robustness of adaptive testing to multidimensionality.

In this study, the maximum information adaptive testing strategy using
maximum likelihood scoring was applied to datasets varying from strictly
unidimensional to four-factor datasets that reflected the structure of the most
multidimensional subtest of the Armed Services Vocational Aptitude Battery.
Between these extremes were two- and three-factor datasets in which the second
and third factors accounted for varying proportions of variance in comparison to
the first factor, thus simulating item structures varying from very little
multidimensionality to a very high degree of multidimensionality. A total of 45
data structures was examined.

To evaluate the effects of multidimensionality, dichotomous item responses
were simulated from the specified multidimensional structures. These item
responses were then treated as if they were derived from a unidimensional model,
and adaptive testing was implemented using the item response vectors. To
evaluate the performance of the maximum information adaptive testing strategy
under multidimensionality, the conditional bias, inaccuracy, and root mean
square error of the θ estimates were computed relative to the true first factor
θ from the multidimensional structure.
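
The conditional indices named above can be sketched as follows for a set of
simulated θ estimates obtained at one true θ level. The definitions of bias and
root mean square error are standard; treating "inaccuracy" as the mean absolute
error is an assumption made for this illustration, since the report summary
does not define it, and the function name and data are hypothetical.

```python
import math


def conditional_error_indices(true_theta, estimates):
    """Bias, inaccuracy, and RMSE of estimates at one true theta level.

    Inaccuracy is taken here to be the mean absolute error (an assumption).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    errors = [estimate - true_theta for estimate in estimates]
    n = len(errors)
    bias = sum(errors) / n
    inaccuracy = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, inaccuracy, rmse


# Hypothetical estimates from three simulated examinees with true theta 0.
print(conditional_error_indices(0.0, [0.5, -0.5, 1.0]))
```

Repeating this computation at each of a series of true θ levels gives the
conditional curves against which the effects of multidimensionality can be
judged.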

Response modes, test item formats, and effects of test administration
variables. The administration of ability tests by interactive computers allows
the use of item types that transcend the typical dichotomously-scored
multiple-choice test item. Research Report 83-3 examined aspects of a
probabilistic response mode used in conjunction with the typical multiple-choice
item format. This response mode was chosen as one means of extracting additional
information from a multiple-choice item, rather than simply requiring a choice
of a single response alternative.

A major problem with probabilistic responding to multiple-choice items in
conventional paper-and-pencil test administration is that examinees do not
always follow the instructions carefully, so that the probabilities they assign
to the item responses do not always sum to 1.00. As a consequence, large amounts
of data might be lost for a given examinee. When multiple-choice items are
answered in a probabilistic mode on a computer terminal, however, the validity
of the distribution of the probabilities can be checked immediately for each
individual's responses to each test item, and invalid responses can be adjusted
until they meet the appropriate criteria.
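
A check of this kind might look like the following Python sketch, which could
be applied to each response before the next item is presented. The function
name, the rounding tolerance, and the wording of the messages are hypothetical
and are not drawn from the software used in Research Report 83-3.

```python
def validate_probabilities(probs, tolerance=0.005):
    """Check one examinee's probabilities for the alternatives of one item.

    Returns (is_valid, message). The tolerance allows for rounding when
    probabilities are entered to two decimal places.
    """
    if not probs:
        return False, "no probabilities were entered"
    if any(p < 0.0 or p > 1.0 for p in probs):
        return False, "each probability must lie between 0 and 1"
    total = sum(probs)
    if abs(total - 1.0) > tolerance:
        return False, f"probabilities sum to {total:.2f}, not 1.00"
    return True, "ok"


# Hypothetical response to a four-alternative item.
print(validate_probabilities([0.5, 0.3, 0.3, 0.0]))
```

An invalid result would prompt the examinee to revise the entries, so that no
response has to be discarded after testing.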

The utility of the probabilistic response mode was examined first by
comparing the usefulness of different scoring formulas associated with the
response mode. Then, the factor structure resulting from the probabilistic
response mode was studied in comparison to the factor structure obtained from
scoring the responses dichotomously. Also examined in Research Report 83-3 were
the validities of the scores obtained from the different scoring methods, their
reliabilities, and the effects of certainty or risk-taking on the probabilistic
scores.

Thompson's (1983) study also involved the administration of items in
different response formats to college students. The study crossed two response
formats (categorical and probabilistic) with two item types (multiple-choice and
dichotomous) to obtain four different types of test items. These were (1) the
conventional multiple-choice item; (2) a probabilistic multiple-choice item,
similar to that used in Research Report 83-3; (3) a dichotomous (yes, no) item;
and (4) a dichotomous-probabilistic item in which an examinee answered by
stating, with a number between 0 and 100, his/her confidence that the answer to
the question was the correct answer. Similar to Research Report 83-3, Thompson
investigated several scoring systems for the probabilistic items. In addition,
the four test item types were evaluated in terms of the intercorrelations of the
scores they provided, their reliabilities, and their factor structures.

One other factor related to adaptive testing examined in this project
concerned the effects of test administration variables on ability test
performance and psychological reactions to testing. This study (Research Report
81-2) investigated the effects of two variables unique to computer
administration. One variable was immediate knowledge of results of the
correctness of each item response during the process of test administration.
The second variable, pacing of item presentation, was concerned with whether
the pace of the test administration was controlled by the examinee or by the
computer. The two variables were studied in both computer-administered
conventional and adaptive tests. The dependent variables included ability test
performance (maximum likelihood θ estimates and proportion correct), response
pattern information, item response latencies, and psychological reactions to
testing. Data were obtained from 477 college students who were randomly
assigned to the experimental conditions.

Major  Findings 


Adaptive  Achievement  Testing 

1.  Adaptive intersubtest branching is a feasible approach to improving the
efficiency of test administration when a test battery is adaptively
administered. This approach can reduce test battery length by 50% or more with
no appreciable effect on the psychometric characteristics of scores on the
tests in the battery (Research Reports 79-6 and 81-4). Although the major
reductions in test battery length were attributable to adaptive intrasubtest
item selection, there were additional small reductions in test length
due to intersubtest branching. Intersubtest branching also resulted in
test battery information levels that closely approximated those of the full
test battery, in comparison to information levels obtained solely from the
use of adaptive intrasubtest item selection (Research Report 81-4). Results
also indicated (Research Report 81-4) that the order in which subtests
were selected for intersubtest branching had no effect on either the
efficiency of test administration or on the psychometric characteristics of
the resulting test scores.

2.  The use of change scores to measure changes in achievement over time, which
assumes that the factor underlying changes in performance is invariant, may
be appropriate in some achievement testing environments and not in others.
Results from college courses (Research Report 81-5) indicated that the factor
structure of measured achievement in a biology course was not the same
prior to instruction as it was after several weeks of instruction. In a
mathematics course, however, the factor structure of measured achievement
did not change over a 10-week period. These results suggest that in the
absence of information to indicate that the dimensionality of measured
achievement does not change over time, it is inappropriate to compute simple
difference scores to measure changes in achievement levels.

3.  There was some indication (Kingsbury, 1984) that Bayesian adaptive tests
using an individual prior achievement level estimate resulted in more reliable
change scores than were obtained from comparable conventional achievement
tests. Further research is needed, however, to investigate the
generalizability of these findings in other achievement domains.

4.  Adaptive Mastery Testing (AMT) is a viable procedure for reducing test
length of mastery tests and improving the efficiency of mastery
classifications. In monte carlo simulation (Research Report 80-4) AMT achieved
the best combination of test length reduction and validity of mastery
classifications in comparison with a sequential probability ratio classification
procedure and conventional tests scored by proportion correct and IRT-based
Bayesian scoring. The advantages of AMT were most pronounced in realistic
item pools in which items varied in difficulties and discriminations. AMT
also tended to result in a more even balance of false mastery and false
non-mastery classifications in comparison to the sequential procedure.

Results from the simulation study were supported in live testing (Research
Report 81-3). In comparison to conventional achievement tests, both fixed-
and variable-length AMTs resulted in mastery classifications that were more
consistent with an independent mastery criterion. The average variable-length
adaptive test was able to make a high-confidence classification for
students using only 2 to 5 items, thus reducing test lengths by as much
as 74% to 88% relative to the 20-item conventional test, with no loss in
classification accuracy.
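
As an illustration of how a variable-length mastery test can stop after only a
few items, the following Python sketch applies a posterior-probability
stopping rule. It is not the AMT procedure evaluated in Research Reports 80-4
and 81-3: the 3-parameter logistic response function with D = 1.7, the
standard normal prior, the grid approximation, the fixed item order, and the
names `p_correct`, `posterior`, and `classify` are all assumptions made for
the example.

```python
import math


def p_correct(theta, a, b, c):
    """3-parameter logistic probability of a correct response (D = 1.7 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def posterior(responses, items, grid, prior_sd=1.0):
    """Normalized posterior over a theta grid under an assumed N(0, prior_sd^2) prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * (t / prior_sd) ** 2)
        for u, (a, b, c) in zip(responses, items):
            p = p_correct(t, a, b, c)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def classify(responses, items, cutoff, confidence=0.95):
    """Stop as soon as the posterior probability of theta >= cutoff is decisive.

    Items are taken in the order given; adaptive item selection is omitted
    to keep the sketch short. Returns (decision, number_of_items_used).
    """
    grid = [-4.0 + 0.05 * i for i in range(161)]
    for n in range(1, len(responses) + 1):
        post = posterior(responses[:n], items[:n], grid)
        p_master = sum(w for t, w in zip(grid, post) if t >= cutoff)
        if p_master >= confidence:
            return "master", n
        if p_master <= 1.0 - confidence:
            return "non-master", n
    return "undecided", len(responses)


# Hypothetical usage: ten identical items located at the mastery cutoff.
items = [(2.0, 0.0, 0.2)] * 10
decision, items_used = classify([1] * 10, items, cutoff=0.0)
```

With highly discriminating items located near the cutoff, a consistent
response pattern makes the posterior decisive well before the item supply is
exhausted, which is the mechanism behind the short test lengths reported
above.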

Adaptive  Ability  Testing 

5. Owen's Bayesian adaptive testing strategy results in θ estimates that,
under realistic testing conditions, are biased and not of equal precision
across θ levels (Research Report 83-2 and Weiss & McBride, 1984). Only
under the unrealistic situation in which true θ was used as the prior θ did
Owen's procedure result in unbiased θ estimates and reasonably horizontal
information functions. Bias was also differentially affected by item
discriminations for variable-length tests. In addition, for these tests, test
length was an increasing function of θ. The design of these studies
allowed identification of the source of the bias to be the use of a constant
(group) prior θ estimate to begin the Bayesian adaptive testing.

6. Errors in item parameter estimates do not seriously affect the performance
of adaptive testing strategies (Crichton, 1981; Mattson, 1983). In the
3-parameter data (Crichton, 1981) using indices combined across θ levels,
when error was introduced into the separate item parameter estimates the
effects were small for errors within the usually observed range. The a and
b parameters generally had similar effects on adaptive test performance,
while errors in the c parameter had negligible effects. When errors in the
three parameters were combined, effects differed little from the case with
error in a or b, except for very unrealistic levels of error. There were
no appreciable differences in susceptibility to error among the stradaptive,
maximum information, and Bayesian adaptive testing strategies.

7. When indices conditional on θ were examined in the 3-parameter data
(Crichton, 1981), Bayesian and maximum information adaptive tests were
somewhat less susceptible to errors in item parameter estimates than was
the stradaptive test. Whereas errors in estimation of the b and c parameters
had little effect on the conditional indices, estimation errors in the
a parameter resulted in the major effects on the conditional indices,
indicating that large errors in estimating a may deteriorate the performance of
the adaptive testing strategies. Even with this deterioration in performance,
however, the adaptive tests still performed better than the conventional
tests for a substantial portion of the θ range.

8. When a and b parameter estimates were allowed to correlate with each other,
as they do in many real item pools, there was no additional effect on θ
estimates beyond that due to errors in uncorrelated item parameter estimates
(Mattson, 1983).

9. Maximum likelihood θ estimation performed better than Bayesian estimation
for lesser degrees of error in the item parameters, and Bayesian estimation
was less affected by item parameter errors for more extreme levels of error,
particularly for the 1- and 2-parameter models (Mattson, 1983).

10. The 2-parameter model was least affected by errors in item parameter
estimates (Mattson, 1983). Under conditions of large errors in item parameter
estimates, the 2-parameter model performed better than the error-free cases
of 1- and 3-parameter models.

11. Multidimensionality has a more serious effect on θ estimates from maximum
information adaptive tests than do errors in item parameter estimates
(Suhadolnik & Weiss, 1983). For multidimensional structures with one or
two factors beyond the first that account for up to one-fourth the variance
of the first factor, overcoming the effects of multidimensionality would
require doubling of adaptive test length. The data also suggested that the
number of factors, and not simply the overall strength of the factor
structure, affects θ estimates, since a single factor beyond the first had less
effect than did two factors that accounted for the same amount of variance.
In general, however, adaptive testing is quite robust to multidimensional
structures of the type most frequently resulting from careful item
selection, i.e., factor structures with a strong first factor and second or
third factors that account for less than one-eighth of the variance of the
first factor.

12. Administration of multiple-choice items in a probabilistic response mode
may be a useful application of computerized test administration. Although
items answered in a probabilistic mode did not result in higher validities
than multiple-choice items responded to dichotomously, the probabilistic
mode resulted in higher reliabilities and a stronger first factor (Research
Report 83-3). Since the stronger first factor would result in higher IRT
item discrimination parameters for these items, adaptive testing based on
items administered probabilistically would likely be more efficient,
resulting in shorter tests or in more precise θ estimates. Additional analyses
of item formats and response modes (Thompson, 1983) showed that items
presented in a dichotomous format yielded different factor structures than
did multiple-choice formats, but supported the higher reliabilities
observed in Research Report 83-3 for the probabilistic response format.

13. Computerized test administration variables (including adaptive vs.
conventional test type, computer- vs. self-paced item administration, and
immediate knowledge of results after each item is administered) do not have
direct effects on ability test performance, as measured by estimated θ levels
(Research Report 81-2). These test administration variables do, however,
have effects on psychological reactions to testing. Immediate knowledge of
results appears to have a standardizing effect on test anxiety and
test-taking motivation, since mean levels of anxiety and motivation were
similar across testing conditions when knowledge of results was provided but
differed when it was not.
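
For reference, the a, b, and c item parameters discussed in the conclusions
above are those of the 3-parameter logistic model. The report summary does not
restate the model, so the form below is the standard textbook one, with the
scaling constant D (commonly 1.7) and the item subscript i introduced here as
notational assumptions. The second expression is the usual item information
function on which maximum information item selection is based.

```latex
P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp\left[-D a_i (\theta - b_i)\right]},
\qquad
I_i(\theta) = D^2 a_i^2 \,
  \frac{1 - P_i(\theta)}{P_i(\theta)}
  \left[ \frac{P_i(\theta) - c_i}{1 - c_i} \right]^2
```

Because information grows with the square of a_i, errors in the
discrimination estimate distort item selection more directly than errors in
c_i, which is consistent with the conditional results summarized in item 7.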



REFERENCES

Crichton, L. J. (1981, June). Effect of error in item parameter estimates on
adaptive testing. Unpublished doctoral dissertation, University of Minnesota.

Gialluca, K. A., & Weiss, D. J. (1979, November). Efficiency of an adaptive
inter-subtest branching strategy in the measurement of classroom achievement
(Research Report 79-6). Minneapolis: University of Minnesota, Department of
Psychology, Psychometric Methods Program, Computerized Adaptive Testing
Laboratory.

Gialluca, K. A., & Weiss, D. J. (1981, December). Dimensionality of measured
achievement over time (Research Report 81-5). Minneapolis: University of
Minnesota, Department of Psychology, Psychometric Methods Program,
Computerized Adaptive Testing Laboratory.

Johnson, M. F., Weiss, D. J., & Prestwood, J. S. (1981, February). Effects of
immediate feedback and pacing of item presentation on ability test performance
and psychological reactions to testing (Research Report 81-2). Minneapolis:
University of Minnesota, Department of Psychology, Psychometric Methods
Program, Computerized Adaptive Testing Laboratory.

Kingsbury, G. G. (1984, August). Adaptive self-referenced testing as a
procedure for the measurement of individual change due to instruction: A
comparison of the reliabilities of change estimates obtained from adaptive and
conventional testing procedures. Unpublished doctoral dissertation,
University of Minnesota.

Kingsbury, G. G., & Weiss, D. J. (1980a, September). A comparison of ICC-based
adaptive mastery testing and the Waldian probability ratio method. In D. J.
Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference
(pp. 120-139). Minneapolis: University of Minnesota, Department of Psychology,
Psychometric Methods Program, Computerized Adaptive Testing Laboratory.

Kingsbury, G. G., & Weiss, D. J. (1980b, November). A comparison of adaptive,
sequential, and conventional testing strategies for mastery decisions
(Research Report 80-4). Minneapolis: University of Minnesota, Department of
Psychology, Psychometric Methods Program, Computerized Adaptive Testing
Laboratory.

Kingsbury, G. G., & Weiss, D. J. (1980c, December). An alternate-forms
reliability and concurrent validity comparison of Bayesian adaptive and
conventional ability tests (Research Report 80-5). Minneapolis: University of
Minnesota, Department of Psychology, Psychometric Methods Program,
Computerized Adaptive Testing Laboratory.

Kingsbury, G. G., & Weiss, D. J. (1981, September). A validity comparison of
adaptive and conventional strategies for mastery testing (Research Report
81-3). Minneapolis: University of Minnesota, Department of Psychology,
Psychometric Methods Program, Computerized Adaptive Testing Laboratory.

Mattson, J. D. (1983, June). Effects of item parameter error and other factors
on trait estimation in latent-trait-based adaptive testing. Unpublished
doctoral dissertation, University of Minnesota.

Maurelli, V. A., & Weiss, D. J. (1981, November). Factors influencing the
psychometric characteristics of an adaptive testing strategy for test
batteries (Research Report 81-4). Minneapolis: University of Minnesota,
Department of Psychology, Psychometric Methods Program, Computerized Adaptive
Testing Laboratory.

Suhadolnik, D., & Weiss, D. J. (1983, July). Effect of examinee certainty on
probabilistic test scores and a comparison of scoring methods for
probabilistic responses (Research Report 83-3). Minneapolis: University of
Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory.

Suhadolnik, D., & Weiss, D. J. (1985). Robustness of adaptive testing to
multidimensionality. In D. J. Weiss (Ed.), Proceedings of the 1982 Item
Response Theory and Computerized Adaptive Testing Conference (pp. 248-280).
Minneapolis: University of Minnesota, Department of Psychology, Psychometric
Methods Program, Computerized Adaptive Testing Laboratory.

Thompson, J. G. (1983, August). An investigation of the dimensionality of
multiple-choice and dichotomous vocabulary test items administered in
probabilistic and categorical response formats. Unpublished Master's thesis,
University of Minnesota.

Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive
testing. Applied Psychological Measurement, 6, 473-492.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive
testing to educational problems. Journal of Educational Measurement, 21,
361-375.

Weiss, D. J., & Davison, M. L. (1981, January). Review of test theory and
methods (Research Report 81-1). Minneapolis: University of Minnesota,
Department of Psychology, Psychometric Methods Program, Computerized Adaptive
Testing Laboratory.

Weiss, D. J., & McBride, J. R. (1983, March). Bias and information of Bayesian
adaptive testing (Research Report 83-2). Minneapolis: University of
Minnesota, Department of Psychology, Psychometric Methods Program,
Computerized Adaptive Testing Laboratory.

Weiss, D. J., & McBride, J. R. (1984). Bias and information of Bayesian
adaptive testing. Applied Psychological Measurement, 8, 273-283.




to each alternative as the item score. Total test scores for all of the scoring
methods were obtained by summing individual item scores.

Several studies using probabilistic response methods have shown the effect of a
response-style variable, called certainty or risk taking, on scores obtained
from probabilistic responses. Results from this study showed a small effect of
certainty on the probabilistic scores in terms of the validity of the scores but
no effect at all on the factor structure or internal consistency of the scores.
Once the effect of certainty on the probabilistic scores had been ruled out, the
five scoring formulas were compared in terms of validity, reliability, and
factor structure. There were no differences in the validity of the scores from the
different methods, but scores obtained from the two scoring formulas that were
not reproducing scoring systems were more reliable and had stronger first
factors than the scores obtained using the reproducing scoring systems. For
practical use, however, the reproducing scoring systems may have an advantage
because they maximize examinees' scores when examinees respond honestly, while
honest responses will not necessarily maximize an examinee's score with the
other two methods. If a reproducing scoring system is used for this reason, the
spherical scoring formula is recommended, since it was the most internally
consistent and showed the strongest first factor of the reproducing scoring
systems.
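
The reproducing property described above can be checked numerically. The
Python sketch below uses the common textbook forms of the spherical,
quadratic, and truncated logarithmic scoring rules; the abstract does not give
the exact formulas used in Research Report 83-3, so these forms, the
truncation floor of 0.01, and every function name are assumptions. Responses
are expressed as probabilities (points divided by 100).

```python
import math


def spherical(p, correct):
    """Spherical rule: probability on the keyed answer over the vector norm."""
    return p[correct] / math.sqrt(sum(x * x for x in p))


def quadratic(p, correct):
    """Quadratic rule: rewards the keyed answer, penalizes concentrated guesses."""
    return 2.0 * p[correct] - sum(x * x for x in p)


def truncated_log(p, correct, floor=0.01):
    """Logarithmic rule with an assumed floor so that log(0) never occurs."""
    return math.log(max(p[correct], floor))


def probability_assigned(p, correct):
    """Non-reproducing comparison: the probability given to the keyed answer."""
    return p[correct]


def expected_score(rule, reported, belief):
    """Expected item score when `belief` holds but `reported` is submitted."""
    return sum(q * rule(reported, k) for k, q in enumerate(belief))


# Hypothetical examinee belief over four alternatives.
belief = [0.6, 0.2, 0.1, 0.1]
all_in = [1.0, 0.0, 0.0, 0.0]

honest_quadratic = expected_score(quadratic, belief, belief)
all_in_quadratic = expected_score(quadratic, all_in, belief)
```

Under the three reproducing rules the honest report has the highest expected
score, whereas under `probability_assigned` the examinee gains by placing all
points on the most likely alternative, which is why honest responding is not
guaranteed to maximize scores under that method.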




achievement  levels  increased  the  underlying  factor  structure  remained  unchanged. 
The  implications  of  these  results  for  psychology,  education,  and  program 
evaluation are noted. (AD A110955)


Research  Report  83-2 

Bias  and  Information  of  Bayesian  Adaptive  Testing 
David  J.  Weiss  and  James  R.  McBride 
March  1983 

Monte Carlo simulation was used to investigate score bias and information
characteristics of Owen's Bayesian adaptive testing strategy, and to examine
possible causes of score bias. Factors investigated in three related studies
included effects of an accurate prior θ estimate, effects of item
discrimination, and effects of fixed vs. variable test length. Data were
generated from a three-parameter logistic model for 3,100 simulees in each of
eight data sets; Bayesian adaptive tests were administered, drawing items from
a "perfect" item pool. Results showed that the Bayesian adaptive test yielded
unbiased θ estimates and relatively flat information functions only in the
unrealistic situation in which an accurate prior θ estimate was used. When a
more realistic constant prior θ estimate was used with a fixed test length,
severe bias was observed that varied with item discrimination. A different
pattern of bias was observed with variable test length and a constant prior.
Information curves for the constant prior conditions generally became more
peaked and asymmetric with increasing item discrimination. In the variable
test length condition the test length required to achieve a specified level of
the posterior variance of θ estimates was an increasing function of θ level.
These results indicate that θ estimates from Owen's Bayesian adaptive testing
method are affected by the prior θ estimate used and that the method does not
provide measurements that are unbiased and equiprecise except under the
unrealistic condition of an accurate prior θ estimate. (AD A129280)
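
The influence of the prior on a Bayesian θ estimate can be illustrated with a
small sketch. The code below does not implement Owen's sequential
normal-approximation procedure; it substitutes a grid-based posterior mean
under a normal prior, and the item parameters, response pattern, D = 1.7
scaling, and function names are hypothetical values chosen for the example.

```python
import math


def p3pl(theta, a, b, c):
    """3-parameter logistic response probability (D = 1.7 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def posterior_mean(responses, items, prior_mean, prior_sd=1.0):
    """Posterior mean of theta on a grid, under a N(prior_mean, prior_sd^2) prior."""
    grid = [-6.0 + 0.05 * i for i in range(241)]
    numerator = 0.0
    denominator = 0.0
    for t in grid:
        w = math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)
        for u, (a, b, c) in zip(responses, items):
            p = p3pl(t, a, b, c)
            w *= p if u == 1 else 1.0 - p
        numerator += t * w
        denominator += w
    return numerator / denominator


# Hypothetical fixed-length test for a simulee whose true theta is near 2:
# ten items located at b = 2, answered correctly about half the time.
items = [(1.5, 2.0, 0.2)] * 10
responses = [1, 0] * 5

constant_prior = posterior_mean(responses, items, prior_mean=0.0)
accurate_prior = posterior_mean(responses, items, prior_mean=2.0)
```

For the same responses, the estimate obtained with the constant (group) prior
centered at 0 lies below the estimate obtained with a prior centered at the
simulee's own level, so estimates for examinees far from the group mean are
pulled toward it. This regression toward the prior is the kind of bias the
abstract attributes to a constant prior.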


Research  Report  83-3 

Effect  of  Examinee  Certainty  on  Probabilistic  Test  Scores 
and  a  Comparison  of  Scoring  Methods  for  Probabilistic  Responses 
Debra  Suhadolnik  and  David  J.  Weiss 
July  1983 

The present study was an attempt to alleviate some of the difficulties inherent
in multiple-choice items by having examinees respond to multiple-choice items in
a probabilistic manner. Using this format, examinees are able to respond to
each alternative and to provide indications of any partial knowledge they may
possess concerning the item. The items used in this study were 30
multiple-choice analogy items. Examinees were asked to distribute 100 points
among the four alternatives for each item according to how confident they were
that each alternative was the correct answer. Each item was scored using five
different scoring formulas. Three of these scoring formulas (the spherical,
quadratic, and truncated log scoring methods) were reproducing scoring systems.
The fourth scoring method used the probability assigned to the correct
alternative as the item score, and the fifth used a function of the absolute
difference between the correct response vector for the four alternatives and
the actual points assigned




Research  Report  81-4 

Factors  Influencing  the  Psychometric  Characteristics  of  an 
Adaptive  Testing  Strategy  for  Test  Batteries 
Vincent  A.  Maurelli  and  David  J.  Weiss 
November  1981 

A Monte Carlo simulation was conducted to assess the effects in an adaptive
testing strategy for test batteries of varying subtest order, subtest
termination criterion, and variable versus fixed entry on the psychometric
properties of an existent achievement test battery. Comparisons were made among
conventionally administered tests and adaptive tests using adaptive
intra-subtest item selection with and without inter-subtest branching. Data
consisted of responses of 300 simulees to a 201-item achievement test battery.
Mean test battery length was reduced by 42.5% to 52.3% using adaptive
intra-subtest item selection with variable termination. Reductions in mean
subtest lengths ranged from 27% to 67%. When inter-subtest branching was added,
additional test length reductions of 1% to 2% were observed for individual
subtests. The reductions in test length were achieved with no significant loss
of fidelity or psychometric information. The addition of inter-subtest
branching resulted in levels of mean test battery information more similar to
those of the full test battery, even with mean test battery reductions of 50%
in number of items administered. Subtest order was shown to have no effect on
the evaluative criteria employed. The results generally supported previous
studies of this adaptive testing strategy. Suggestions for future research are
presented. (AD A109666)


Research  Report  81-5 

Dimensionality  of  Measured  Achievement  Over  Time 
Kathleen A. Gialluca and David J. Weiss
December  1981 

Some  type  of  difference  or  change  score  is  frequently  used  to  quantify  the 
effects  of  experimental  treatments  and  educational  programs  on  individuals  and 
on  groups  of  individuals.  Whether  the  change  measurement  involves  the  use  of 
simple  difference  scores,  their  derivatives,  or  some  more  complex  methodological 
design,  the  measurement  process  itself  assumes  that  the  treatment  or  instruction 
results  in  higher  levels  of  the  originally  measured  variable  and  that  the  only 
change  that  occurs  is  a  quantitative  one.  If  this  assumption  is  not  met,  then 
the  computation  of  any  type  of  difference  score  is  inappropriate  and  the  scores 
themselves  are  useless  for  measuring  growth  or  change. 

Two studies investigated the tenability of the assumption that classroom
instruction results in increases in students' achievement levels while the
qualitative nature of that achievement remains constant across time. The data
utilized were the item responses to tests in basic mathematics and in general
biology administered as pretests and after instruction to students enrolled in
those courses.

Results indicated that this assumption was not tenable in the biology data set,
where increases in mean achievement level were accompanied by corresponding
changes in the factor structure underlying the item responses. For the
mathematics data, however, there was no such violation of the assumption: As
student




These results indicate that testing conditions may interact in a complex way to
determine psychological reactions to the testing environment. The interactions
do suggest, however, a somewhat consistent standardizing effect of KR on test
anxiety and test-taking motivation. This standardizing effect of KR showed that
approximately equal levels of motivation and anxiety were reported under the
various testing conditions when KR was provided, but that mean levels of these
variables were substantially different when KR was not provided. Consistent
with theoretical expectations, the conventional test was perceived as being
either too easy or too difficult, whereas the adaptive tests were perceived more
often as being of appropriate difficulty.

The results concerning the effects of KR on test performance, motivation, and
anxiety found in this study were contrary to earlier reported findings; and
differences in the studies are delineated. Recommendations are made concerning
the control of specific testing conditions, such as difficulty of the test and
ability level of the examinee population, as well as suggestions for the further
analysis of the standardizing effect of KR. (AD A097688)


Research  Report  81-3 

A  Validity  Comparison  of  Adaptive  and  Conventional 
Strategies  for  Mastery  Testing 
G. Gage Kingsbury and David J. Weiss
September 1981

Conventional mastery tests designed to make optimal mastery classifications were
compared with fixed-length and variable-length adaptive mastery tests in terms
of validity of decisions with respect to an external criterion measure.
Comparisons between the testing procedures were made across five content areas
in an introductory biology course from tests administered to over 400 volunteer
students. The criterion measure used was the student's final standing in the
course, based on course examinations and laboratory grades. Results indicated
that the adaptive test resulted in mastery classifications that were more
consistent with final class standing than those obtained from the conventional
test. This result was observed within individual content areas and for
discriminant analysis classifications made across content areas. This result
was also observed for two scoring procedures used with the conventional test
(proportion-correct and Bayesian scoring). Results also indicated that there
was no decrement in the performance of the adaptive test when a variable
termination rule was implemented. This variable termination rule resulted in
test lengths which were, on the average, 74% to 88% shorter than the original
adaptive tests. Further analyses explicated the manner in which the adaptive
tests administered differed from the conventional test for each content area as
a function of achievement level. This evidence was used to explain why the
adaptive tests resulted in more valid decisions than the conventional
procedure, in spite of the fact that the type of conventional test used here
was the most informative test concerning the mastery cutoff. It is concluded
that variable-length adaptive mastery tests can provide more valid mastery
classifications than "optimal" conventional mastery tests while reducing test
length an average of 80% from the length of the conventional tests. (AD A106867)


plied to problems of item option weighting and adaptive testing. Important
developments with these models during the period included the demonstration of
their relationship with other psychological measurement models, and methods for
determining fit of individuals to IRT models. As another alternative to
classical test theory, order models were developed and studied, and several
other models were proposed.

Validity issues were also studied during this period. A number of approaches to
the analysis of multitrait-multimethod matrices were proposed and compared,
including some based on structural equations models. Issues of predictive
validity studied included necessary sample sizes, validity generalization, and
moderator and suppressor effects. Test fairness issues and their effects on
validity received considerable attention. Concern was with (1) bias in
selection; (2) fairness to minorities, including differential and single-groups
validity and comparisons of regression lines, adverse impact, and bias in test
content; and (3) fairness to women.

It is concluded that little of consequence was accomplished in classical test
theory during this period. The most important developments were in alternatives
to classical test theory, primarily item response theory. Research in this area
resulted in data and other developments that will permit a better understanding
of the range of applicability of these models and their potential for solving
measurement problems not solvable by classical models. (AD A096157)


Research  Report  81-2 

Effects  of  Immediate  Feedback  and  Pacing  of  Item  Presentation 
on  Ability  Test  Performance  and  Psychological  Reactions  to  Testing 
Marilyn  F.  Johnson,  David  J.  Weiss,  and  J.  Stephen  Prestwood 

February  1981 

The study investigated the joint effects of knowledge of results (KR or no-KR),
pacing of item presentation (computer or self-pacing), and type of testing
strategy (50-item peaked conventional, variable-length stradaptive, or 50-item
fixed-length stradaptive test) on ability test performance, test item response
latency, information, and psychological reactions to testing. The psychological
reactions to testing were obtained from Likert-type items that assessed
test-taking anxiety, motivation, perception of difficulty, and reactions to
knowledge of results. Data were obtained from 447 college students randomly
assigned to one of the 12 experimental conditions.

The results indicated that there were no effects on ability estimates due to
knowledge of results, testing strategy, or pacing of item presentation.
Although average latencies were greater on the stradaptive tests than on the
conventional test, the overall testing time was not substantially longer on the
adaptive tests and may have been a function of differences in test difficulty.
Analysis of information values indicated higher levels of information on the
stradaptive tests than on the conventional test. There was no statistically
significant main effect for any of the three experimental conditions when test
anxiety or test-taking motivation was the dependent variable, although there
were some significant interaction effects.


The  concurrent  validity  analysis  showed  that  the  conventional  test  produced 
ability  level  estimates  that  correlated  more  highly  with  the  criterion  test 
scores  than  did  the  Bayesian  test  for  all  lengths  greater  than  four  items.  This 
result  was  observed  for  both  scoring  procedures  used  with  the  conventional  test. 

Limitations of the study, and the conclusions that may be drawn from it, are
discussed. These limitations, which may have affected the results of this
study, included possible differences in the alternate forms used within the two
testing strategies, the relatively small calibration samples used to estimate
the ICC parameters for the items used in the study, and method variance in the
conventional tests. (AD A094477)


Research  Report  81-1 
Review  of  Test  Theory  and  Methods 
David  J.  Weiss  and  Mark  L.  Davison 
January  1981 

The research literature on test theory and methods for the period 1975 through
early 1980 is critically reviewed. Research on classical test theory has
concentrated on relatively unimportant developments in reliability theory, with
some new developments and applications of generalizability theory appearing
during this period. The reliability of change or gain scores has received some
attention from the classical test theory perspective, as have the applications
of classical reliability concepts in experimental design and the analysis of
experimental data. A minor amount of research with classical models was in the
area of test-score equating. Classical item analysis procedures, however,
received little attention. A fair amount of research during the period was
devoted to different item types and test item response modes as replacements
for the ubiquitous multiple-choice item. Several types of true-false items were
proposed, and formula scoring was studied by a number of researchers in an
attempt to reduce guessing effects. The perennial topic of response option
weighting received attention, with efforts oriented toward demonstrating
effects on validity and reliability. Response modes studied included
answer-until-correct, confidence weighting, and free-response.

A number of alternatives to classical test theory were studied in an attempt to
solve some of the problems for which classical test theory has proven to be
inadequate. Research on criterion-referenced testing continued during this
period. Latent trait test theory (item response theory, or IRT) received
considerable attention. Research on the 1-parameter IRT model continued to
address problems of parameter estimation, model fit, and equating. The question
of the person-free and sample-free characteristics of this model (i.e., its
robustness) was investigated, with results generally supporting these desirable
characteristics. In addition, a special case of this model that can account for
guessing was developed, and the model was generalized and successfully applied
to polychotomous attitude types of items. Considerable research occurred on the
2- and 3-parameter IRT models. The concept of information as a replacement for
classical reliability concepts was studied, and its uses in developing parallel
tests were described. As with the 1-parameter IRT model, problems of parameter
estimation and equating were investigated. These IRT models were successfully ap-




Research  Report  80-4 

A Comparison of Adaptive, Sequential, and Conventional Testing
Strategies for Mastery Decisions
G. Gage Kingsbury and David J. Weiss
November 1980

Two procedures for making mastery decisions with variable length tests and a
conventional mastery testing procedure were compared in Monte Carlo simulation.
The simulation varied the characteristics of the item pool used for testing and
the maximum test length allowed. The procedures were compared in terms of the
mean test length needed to make a decision, the validity of the decisions made
by each procedure, and the types of classification errors made by each
procedure. Both of the variable test length procedures were found to result in
important reductions in mean test length from the conventional test length. The
Sequential Probability Ratio Test (SPRT) procedure resulted in greater test
length reductions, on the average, than the Adaptive Mastery Testing (AMT)
procedure. However, the AMT procedure resulted both in more valid mastery
decisions and in more balanced error rates than the SPRT procedure under all
conditions. In addition, the AMT procedure produced the best combination of
test length and validity. (AD A094478)
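
For readers unfamiliar with the Sequential Probability Ratio Test named in this abstract, the following Python sketch shows a generic Wald SPRT applied to a stream of right/wrong item responses. It is a minimal illustration under stated assumptions, not the procedure as implemented in the study: the function name, the nonmastery and mastery proportions `p0` and `p1`, the error rates `alpha` and `beta`, and the forced-decision rule at the maximum test length are all hypothetical choices.

```python
import math


def sprt_mastery(responses, p0=0.5, p1=0.8, alpha=0.05, beta=0.05, max_items=None):
    """Classify an examinee with Wald's sequential probability ratio test.

    responses: iterable of 1 (correct) / 0 (incorrect), in administration order.
    p0, p1: assumed proportion correct for a nonmaster and a master (p0 < p1).
    alpha, beta: nominal false-master and false-nonmaster error rates.
    Returns (decision, number_of_items_used).
    """
    upper = math.log((1.0 - beta) / alpha)   # cross upward: decide "master"
    lower = math.log(beta / (1.0 - alpha))   # cross downward: decide "nonmaster"
    llr = 0.0                                # running log likelihood ratio
    n = 0
    for n, x in enumerate(responses, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "master", n
        if llr <= lower:
            return "nonmaster", n
        if max_items is not None and n >= max_items:
            break
    # Assumed forced decision when testing stops without crossing a boundary.
    return ("master" if llr > 0 else "nonmaster"), n
```

With these illustrative settings, a run of correct answers reaches the upper boundary after seven items, and a run of incorrect answers reaches the lower boundary after four, which is the kind of test-length reduction the abstract refers to.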


Research  Report  80-5 

An  Alternate-Forms  Reliability  and  Concurrent  Validity 
Comparison  of  Bayesian  Adaptive  and  Conventional  Ability  Tests 
G.  Gage  Kingsbury  and  David  J.  Weiss 
December  1980 

Two 30-item alternate forms of a conventional test and a Bayesian adaptive test
were administered by computer to 472 undergraduate psychology students. In
addition, each student completed a 120-item paper-and-pencil test, which served
as a concurrent validity criterion test, and a series of very easy questions
designed to detect students who were not answering conscientiously. All test
items were five-alternative multiple-choice vocabulary items. Reliability and
concurrent validity of the two testing strategies were evaluated after the
administration of each item for each of the tests, so that trends indicating
differences in the testing strategies as a function of test length could be
detected. For each test, additional analyses were conducted to determine
whether the two forms of the test were operationally alternate forms.

Results of the analysis of alternate-forms correspondence indicated that for all
test lengths greater than 10 items, each of the alternate forms for the two test
types resulted in fairly constant mean ability level estimates. When the
scoring procedure was equated, the mean ability levels estimated from the two
forms of the conventional test differed to a greater extent than those
estimated from the two forms of the Bayesian adaptive test.

The alternate-forms reliability analysis indicated that the two forms of the
Bayesian test resulted in more reliable scores than the two forms of the
conventional test for all test lengths greater than two items. This result was
observed when the conventional test was scored either by the Bayesian or
proportion-correct method.




ABSTRACTS  OF  RESEARCH  REPORTS 


Research  Report  79-6 

Efficiency  of  an  Adaptive  Inter-Subtest  Branching  Strategy 
in  the  Measurement  of  Classroom  Achievement 
Kathleen A. Gialluca and David J. Weiss
November 1979

A real-data simulation was conducted to investigate the efficiency of an
adaptive testing strategy designed for achievement test batteries applied to a
classroom achievement test. This testing strategy combined adaptive item
selection routines both within and between the subtests of the test battery.
Comparisons were made between the conventionally-administered tests and the
simulated adaptive tests in terms of test length, psychometric information, and
correlations of achievement estimates. Design of the study also permitted (1)
separation of the effects of the adaptive intra-subtest item selection
procedure and inter-subtest branching, (2) evaluation of the effects of
different intra-subtest termination criteria, (3) use of classical regression
equations and regression equations corrected for errors of measurement in the
predictors, and (4) cross-validation stability of the inter-subtest branching
regression predictions. Data consisted of the responses from 1,600 students to
classroom-administered final exams in a general biology course at the
University of Minnesota.

Total test length was reduced by 16% to 30% using the adaptive intra-subtest
item selection strategy with a variable termination criterion that omits those
items providing little information to the measurement process. Subtest-length
reductions ranged from about 8% to 62%. Total test length was reduced another
1% to 5% (with subtest-length reductions of up to 53%) upon the addition of an
inter-subtest branching strategy that utilized regression equations with prior
information concerning a student's performance.
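
The idea of a variable termination criterion that omits items contributing little information can be sketched as follows. This Python example is illustrative only and is not the procedure used in the study: the 3-parameter logistic information function, the threshold `min_info`, the function names, and the item parameters in the usage note are all assumptions.

```python
import math


def info_3pl(theta, a, b, c, D=1.7):
    """Item information at theta under an assumed 3-parameter logistic model."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def next_item(theta_hat, remaining, min_info=0.1):
    """Pick the most informative remaining item, or None to end the subtest.

    theta_hat: current achievement estimate.
    remaining: dict mapping item id -> (a, b, c) for items not yet administered.
    min_info: hypothetical threshold below which an item is not worth giving.
    """
    if not remaining:
        return None
    best = max(remaining, key=lambda k: info_3pl(theta_hat, *remaining[k]))
    if info_3pl(theta_hat, *remaining[best]) < min_info:
        return None  # every remaining item adds too little information
    return best
```

For a hypothetical pool `{"easy": (1.0, -2.0, 0.2), "medium": (1.0, 0.0, 0.2), "hard": (1.0, 2.0, 0.2)}` and an estimate of 0.0, the sketch selects "medium" first; once that item is removed, neither remaining item reaches the threshold and the subtest ends after one item instead of three.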

Reductions in subtest length were accomplished with virtually no loss in
psychometric information. Correlations between the Bayesian achievement
estimates from the adaptive and conventional tests were uniformly high,
typically r = .90 and higher. Results showed that the use of the corrected
regression equations did little to improve the performance of the inter-subtest
branching; although the multiple correlations for the corrected equations were
higher, both the information curves and correlations of achievement estimates
were generally lower. Cross-validation results indicated that the procedure can
be used in different samples from the same population.

Results  from  this  study  generally  supported  the  generality  of  this  adaptive 
testing  strategy  for  reducing  achievement  test  length  with  no  adverse  impact  on 
the  quality  of  the  measurements.  Suggestions  are  made  for  further  research  with 
this  testing  strategy.  (AD  A080956) 


Perceptions of test difficulty were different for adaptive and conventional
tests; students accurately perceived the conventional tests as either too
easy or too difficult, depending on their ability levels, while the adaptive
test was generally accurately perceived as being of appropriate difficulty.




Previous  Publications  (continued) 


78-3. A Comparison of Levels and Dimensions of Performance in Black and White
Groups on Tests of Vocabulary, Mathematics, and Spatial Ability. October 1978.

78-2. The Effects of Knowledge of Results and Test Difficulty on Ability Test
Performance and Psychological Reactions to Testing. September 1978.

78-1.  A  Comparison  of  the  Fairness  of  Adaptive  and  Conventional  Testing 
Strategies.  August  1978. 

77-7.  An  Information  Comparison  of  Conventional  and  Adaptive  Tests  in  the 
Measurement  of  Classroom  Achievement.  October  1977. 

77-6. An Adaptive Testing Strategy for Achievement Test Batteries. October 1977.

77-5.  Calibration  of  an  Item  Pool  for  the  Adaptive  Measurement  of  Achievement. 
September  1977. 

77-4.  A  Rapid  Item-Search  Procedure  for  Bayesian  Adaptive  Testing.  May  1977. 

77-3.  Accuracy  of  Perceived  Test-Item  Difficulties.  May  1977. 

77-2.  A  Comparison  of  Information  Functions  of  Multiple-Choice  and  Free-Response 
Vocabulary  Items.  April  1977. 

77-1.  Applications  of  Computerized  Adaptive  Testing.  March  1977. 

Final  Report:  Computerized  Ability  Testing,  1972-1975.  April  1976. 

76-5.  Effects  of  Item  Characteristics  on  Test  Fairness.  December  1976. 

76-4.  Psychological  Effects  of  Immediate  Knowledge  of  Results  and  Adaptive 
Ability  Testing.  June  1976. 

76-3.  Effects  of  Immediate  Knowledge  of  Results  and  Adaptive  Testing  on  Ability 
Test  Performance.  June  1976. 

76-2.  Effects  of  Time  Limits  on  Test-Taking  Behavior.  April  1976. 

76-1.  Some  Properties  of  a  Bayesian  Adaptive  Ability  Testing  Strategy.  March 
1976. 

75-6.  A  Simulation  Study  of  Stradaptive  Ability  Testing.  December  1975. 

75-5.  Computerized  Adaptive  Trait  Measurement:  Problems  and  Prospects.  November 
1975. 

75-4.  A  Study  of  Computer-Administered  Stradaptive  Ability  Testing.  October 
1975. 

75-3. Empirical and Simulation Studies of Flexilevel Ability Testing. July 1975.

75-2.  TETREST:  A  FORTRAN  IV  Program  for  Calculating  Tetrachoric  Correlations. 
March  1975. 

75-1.  An  Empirical  Comparison  of  Two-Stage  and  Pyramidal  Adaptive  Ability 
Testing.  February  1975. 

74-5.  Strategies  of  Adaptive  Ability  Measurement.  December  1974. 

74-4.  Simulation  Studies  of  Two-Stage  Ability  Testing.  October  1974. 

74-3.  An  Empirical  Investigation  of  Computer-Administered  Pyramidal  Ability 
Testing.  July  1974. 

74-2.  A  Word  Knowledge  Item  Pool  for  Adaptive  Ability  Measurement.  June  1974. 

74-1.  A  Computer  Software  System  for  Adaptive  Ability  Measurement.  January  1974. 

73-4.  An  Empirical  Study  of  Computer-Administered  Two-Stage  Ability  Testing. 
October  1973. 

73-3.  The  Stratified  Adaptive  Computerized  Ability  Test.  September  1973. 

73-2.  Comparison  of  Four  Empirical  Item  Scoring  Procedures.  August  1973. 

73-1.  Ability  Measurement:  Conventional  or  Adaptive?  February  1973. 

Copies  of  these  reports  are  available,  while  supplies  last,  from: 

Computerized  Adaptive  Testing  Laboratory 
N660  Elliott  Hall 
University  of  Minnesota 
75  East  River  Road 
Minneapolis  MN  55455  U.S.A. 


Previous  Publications 


Proceedings of the 1982 Item Response Theory and Computerized Adaptive
Testing Conference. March 1985.

Proceedings of the 1979 Computerized Adaptive Testing Conference.
September 1980.

Proceedings of the 1977 Computerized Adaptive Testing Conference.
July 1978.

Research  Reports 

83-3.  Effect  of  Examinee  Certainty  on  Probabilistic  Test  Scores  and  a  Comparison 
of  Scoring  Methods  for  Probabilistic  Responses.  July  1983. 

83-2.  Bias  and  Information  of  Bayesian  Adaptive  Testing.  March  1983. 

83-1.  Reliability  and  Validity  of  Adaptive  and  Conventional  Tests  in  a  Military 
Recruit  Population.  January  1983. 

81-5.  Dimensionality  of  Measured  Achievement  Over  Time.  December  1981. 

81-4.  Factors  Influencing  the  Psychometric  Characteristics  of  an  Adaptive 
Testing  Strategy  for  Test  Batteries.  November  1981. 

81-3.  A  Validity  Comparison  of  Adaptive  and  Conventional  Strategies  for  Mastery 
Testing.  September  1981. 

Final  Report:  Computerized  Adaptive  Ability  Testing.  April  1981. 

81-2. Effects of Immediate Feedback and Pacing of Item Presentation on Ability
Test Performance and Psychological Reactions to Testing. February 1981.

81-1.  Review  of  Test  Theory  and  Methods.  January  1981. 

80-5.  An  Alternate-Forms  Reliability  and  Concurrent  Validity  Comparison  of 
Bayesian  Adaptive  and  Conventional  Ability  Tests.  December  1980. 

80-4.  A  Comparison  of  Adaptive,  Sequential,  and  Conventional  Testing  Strategies 
for  Mastery  Decisions.  November  1980. 

80-3.  Criterion-Related  Validity  of  Adaptive  Testing  Strategies.  June  1980. 

80-2.  Interactive  Computer  Administration  of  a  Spatial  Reasoning  Test.  April 
1980. 

Final  Report:  Computerized  Adaptive  Performance  Evaluation.  February  1980. 

80-1.  Effects  of  Immediate  Knowledge  of  Results  on  Achievement  Test  Performance 
and  Test  Dimensionality.  January  1980. 

79-7.  The  Person  Response  Curve:  Fit  of  Individuals  to  Item  Characteristic  Curve 
Models.  December  1979. 

79-6.  Efficiency  of  an  Adaptive  Inter-Subtest  Branching  Strategy  in  the 
Measurement  of  Classroom  Achievement.  November  1979. 

79-5.  An  Adaptive  Testing  Strategy  for  Mastery  Decisions.  September  1979. 

79-4.  Effect  of  Point-in-Time  in  Instruction  on  the  Measurement  of  Achievement. 
August  1979. 

79-3.  Relationships  among  Achievement  Level  Estimates  from  Three  Item 
Characteristic  Curve  Scoring  Methods.  April  1979. 

Final  Report:  Bias-Free  Computerized  Testing.  March  1979. 

79-2. Effects of Computerized Adaptive Testing on Black and White Students.
March 1979.

79-1.  Computer  Programs  for  Scoring  Test  Data  with  Item  Characteristic  Curve 
Models.  February  1979. 

78-5.  An  Item  Bias  Investigation  of  a  Standardized  Aptitude  Test.  December  1978. 

78-4. A Construct Validation of Adaptive Achievement Testing. November 1978.


-continued  inside- 


